Ollama Embedding Node

The Ollama Embedding node converts text content into numerical vector representations using embedding models hosted on a local Ollama server. No API keys are required—models run entirely on your infrastructure, providing complete data privacy. Extensive performance tuning options control GPU/CPU resource allocation, context window sizes, and model keep-alive behavior.

How It Works

When the node executes, it reads text input from a workflow variable, validates the text content, constructs API requests with the specified model and configuration parameters, sends them to the Ollama server over HTTP, and stores the resulting vectors in the output variable. Each text input produces one embedding vector, represented as an array of floating-point numbers, with dimensionality determined by the model (e.g., nomic-embed-text produces 768-dimensional vectors).
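
Conceptually, each request resembles a direct call to Ollama's embeddings endpoint. The following Python sketch is illustrative of the underlying API call (using the /api/embeddings endpoint), not the node's actual implementation:

import requests

def embed_text(text: str, model: str = "nomic-embed-text",
               base_url: str = "http://localhost:11434") -> list[float]:
    """Generate one embedding vector for a single text string."""
    response = requests.post(
        f"{base_url}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    response.raise_for_status()
    # Dimensionality is determined by the model, e.g. 768 for nomic-embed-text.
    return response.json()["embedding"]

vector = embed_text("First document content")
print(len(vector))  # 768 for nomic-embed-text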

Models must be pulled to the Ollama server with the ollama pull command before first use; once pulled, they're cached locally for immediate access. The keep-alive feature keeps models loaded in memory between requests, eliminating model loading overhead for subsequent executions and improving response times in high-frequency workflows.

Each output embedding is correlated with its input item through a unique identifier, so every vector can be traced back to its source text: the input's id when supplied, or a generated UUID otherwise. The node supports model validation on initialization to catch configuration errors early, and failed embedding generation for individual items does not stop processing of the remaining items.
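
The identifier behavior can be pictured as a small helper; this sketch assumes the id/uuid convention described above and is illustrative only:

import uuid

def response_uuid(item: dict) -> str:
    # Reuse the caller-supplied id when present; otherwise generate a fresh UUID.
    return item.get("id") or str(uuid.uuid4())

print(response_uuid({"type": "text", "id": "doc1", "text": "..."}))  # doc1
print(response_uuid({"type": "text", "text": "no id supplied"}))     # e.g. '3f2b9c...'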

Configuration Parameters

Input Field

Input Field (Text, Required): Workflow variable containing text to embed.

The node expects a list of embedding request objects where each object contains a type field (set to "text"), an optional id field (string for tracking), and a text field (string content to embed). Single objects are automatically converted to single-item lists.

Example input structure:

[
  {"type": "text", "id": "doc1", "text": "First document content"},
  {"type": "text", "id": "doc2", "text": "Second document content"}
]

Output Field

Output Field (Text, Required): Workflow variable where embedding results are stored.

The output is a list of EmbeddingResponse objects where each object contains a uuid field (string identifier matching input ID or generated UUID) and an embeddings field (array of floating-point numbers). The list maintains the same order as the input. Empty embeddings are returned for failed generation attempts.

Example output structure:

[
  {"uuid": "doc1", "embeddings": [0.123, -0.456, 0.789, ...]},
  {"uuid": "doc2", "embeddings": [0.234, -0.567, 0.890, ...]}
]

Common naming patterns: text_embeddings, document_vectors, ollama_embeddings, local_embeddings.
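
Downstream nodes typically consume these vectors for similarity comparisons. A minimal example of cosine similarity over two output entries (vectors truncated for illustration):

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal dimensionality."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

results = [  # shape of the node's output variable
    {"uuid": "doc1", "embeddings": [0.123, -0.456, 0.789]},
    {"uuid": "doc2", "embeddings": [0.234, -0.567, 0.890]},
]
print(cosine_similarity(results[0]["embeddings"], results[1]["embeddings"]))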

Model

Model (Text, Required): Ollama model name for embedding generation.

Popular models include nomic-embed-text (768 dimensions, high quality), mxbai-embed-large (1024 dimensions, state-of-the-art), and all-minilm (384 dimensions, fast). The model must be pulled to the server with ollama pull model-name. Run ollama list to see available models. Variable interpolation using ${variable_name} syntax is supported.

Base URL

Base URL (Text, Required): URL where the Ollama server is running.

Default is http://localhost:11434 for local deployments. The URL should include the protocol and port number. The server must be running and accessible before workflow execution. Variable interpolation is supported.

Validate Model on Init

Validate Model on Init (Toggle, Optional): Check if model exists on the Ollama server during initialization.

When enabled, the node verifies model availability before workflow execution, catching configuration errors early. In production workflows where model availability is guaranteed, disable this check to reduce initialization overhead.
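
Conceptually, the check amounts to querying the server's installed-model list. This sketch uses Ollama's /api/tags endpoint and is illustrative rather than the node's actual implementation:

import requests

def model_available(model: str, base_url: str = "http://localhost:11434") -> bool:
    """Return True if the model is present on the Ollama server."""
    tags = requests.get(f"{base_url}/api/tags", timeout=10).json()
    names = {m["name"] for m in tags.get("models", [])}
    # Ollama lists models with a tag suffix, e.g. "nomic-embed-text:latest".
    return model in names or f"{model}:latest" in names

if not model_available("nomic-embed-text"):
    raise RuntimeError("Model missing; run: ollama pull nomic-embed-text")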

Temperature

Temperature (Number, Optional): Sampling temperature for randomness control (0-2).

Lower values produce more deterministic outputs; higher values increase randomness. Not typically used for embeddings, as embedding generation is deterministic by nature.

Context Window Size

Context Window Size (Number, Optional): Context window size in tokens.

Larger values allow longer input texts but consume more memory. Minimum value is 1.

Number of GPUs

Number of GPUs (Number, Optional): Number of GPU layers for inference.

Set to 0 for CPU-only inference, or specify the number of GPU layers for GPU-accelerated processing. Higher values improve performance but require more GPU memory. Minimum value is 0.

Number of Threads

Number of Threads (Number, Optional): Number of CPU threads for inference.

More threads can improve performance on CPU, especially for CPU-only deployments. Minimum value is 1.

Keep Alive

Keep Alive (Number, Optional): Minutes to keep model loaded in memory after use.

Keeping models loaded eliminates loading overhead for subsequent requests. Set to 0 to unload immediately (saves memory), or higher values (30-60) for production workflows with frequent requests. Minimum value is 0.
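
The performance parameters in this section map onto fields of the Ollama request. A sketch of the payload, assuming the node forwards them under Ollama's standard option names (num_ctx, num_gpu, num_thread, keep_alive):

payload = {
    "model": "nomic-embed-text",
    "prompt": "Document text to embed",
    "keep_alive": "30m",      # Keep Alive: hold the model in memory for 30 minutes
    "options": {
        "num_ctx": 2048,      # Context Window Size
        "num_gpu": 0,         # Number of GPUs (0 = CPU-only inference)
        "num_thread": 8,      # Number of Threads
    },
}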

Enable Mirostat

Enable Mirostat (Toggle, Default: false): Enable Mirostat sampling.

Not typically used for embeddings, as Mirostat is designed for text generation tasks.

Mirostat Mode

Mirostat Mode (Dropdown, Default: 1): Mirostat sampling mode when enabled.

  • 1 - Original Mirostat algorithm
  • 2 - Mirostat 2.0, generally recommended for better results

Mirostat Learning Rate

Mirostat Learning Rate (Number, Optional): Learning rate (eta) when Mirostat is enabled.

Controls how quickly the algorithm adapts. Minimum value is 0.

Mirostat Target Entropy

Mirostat Target Entropy (Number, Optional): Target entropy (tau) when Mirostat is enabled.

Controls the desired level of randomness. Minimum value is 0.

Repeat Last N

Repeat Last N (Number, Optional): Tokens to look back for detecting repetition.

Not typically used for embeddings. Minimum value is 0.

Repeat Penalty

Repeat Penalty (Number, Optional): Penalty multiplier for repeated tokens.

Not typically used for embeddings. Minimum value is 0.

Tail Free Sampling

Tail Free Sampling (Number, Optional): Tail free sampling parameter; reduces the influence of less probable tokens during generation.

Not typically used for embeddings. Minimum value is 0.

Top K

Top K (Number, Optional): Sample from top K tokens.

Not typically used for embeddings. Minimum value is 1.

Top P

Top P (Number, Optional): Nucleus sampling threshold (0-1).

Not typically used for embeddings.

Common Parameters

This node supports common parameters shared across workflow nodes, including Stream Output Response, Streaming Messages, and Logging Mode. For detailed information, see Common Parameters.

Best Practices

  • Pull embedding models to the Ollama server before deploying workflows using ollama pull model-name
  • Enable Validate Model on Init during development; disable in production for faster initialization
  • Configure Keep Alive to 30-60 minutes for production workflows with frequent embedding requests
  • Use Number of GPUs to control GPU memory usage—set to 0 for CPU-only inference when GPU resources are limited
  • Store Base URL in workflow variables for easy switching between development, staging, and production servers
  • Focus on Model, Base URL, Keep Alive, and GPU/CPU parameters—most other parameters (Temperature, Mirostat, sampling) are for text generation, not embeddings

Limitations

  • External server dependency: The node requires a running Ollama server. The workflow fails if the server is unreachable or not responding.
  • Model pre-pull required: Models must be pulled to the Ollama server with ollama pull model-name. The node does not download models automatically.
  • Text-only support: The node supports text embeddings only. Image embedding requests fail even though the node accepts the multimodal input format.
  • No authentication support: The node does not support authentication headers or API keys. Server access must be configured at the network level.
  • Memory requirements: Keeping models loaded (Keep Alive) consumes server memory. Monitor memory usage when running multiple or large models.
  • Network latency: Embedding performance depends on network latency between the workflow engine and Ollama server. Co-locate them when possible.